Statistical Named Entity Recognizer Adaptation

Authors

  • John D. Burger
  • John C. Henderson
  • William T. Morgan
Abstract

Named entity recognition (NER) is a widely recognized and useful subtask of information extraction (IE). NER has been explored in depth to provide rapid characterization of newswire data (Sundheim, 1995; Palmer and Day, 1997). The NER task involves both identifying spans of text that refer to named entities and categorizing these entities into classes based on the role they fill in context. The sentence “Washington announced that Washington ate seven hotdogs in Washington” provides an example in which a single name can arguably refer to three different entities: an organization, a person, and a location. Following the paradigm introduced by Ramshaw and Marcus (1999), many researchers reduce the NER problem to a word-tagging problem and address it with techniques similar to those used for part-of-speech tagging (Meteer et al., 1991; Brill, 1995). Borthwick explores the maximum entropy approach in his dissertation (1999). Collins and Singer (1999) investigate semi-unsupervised methods for named entity categorization. Cucerzan and Yarowsky (1999) present a unified technique for building NER systems for several languages, using extensive bootstrapping from small amounts of supervised data with an EM-style algorithm. Miller et al. (2000) produce a statistical Hidden Markov Model (HMM) for NER which is similar to the one used by Palmer et al. (1999); the latter system, named phrag, is the NER engine used in the work described in this paper. The experiments described herein explore unsupervised approaches to NER, with an eye toward using unannotated corpora consisting of a few hundred million words. Recent word sense disambiguation results suggest that some simple techniques can scale well with increased data sizes (Banko and Brill, 2001). This paper presents several experiments in adapting an HMM-based named entity recognizer to a target data set. Our core learning engine is a word-based HMM, and we show two techniques, informed smoothing and iterative adaptation, for incorporating unsupervised data into the model, which provide overall gains in performance.
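
The reduction of NER to word tagging mentioned in the abstract can be illustrated with a minimal sketch. It assumes a B-/I-/O tag encoding and the class names ORGANIZATION, PERSON, and LOCATION; these are illustrative choices, not necessarily the encoding or label set used by phrag. The helper names `spans_to_tags` and `tags_to_spans` are hypothetical.

```python
from typing import List, Tuple

# (start token index, end token index exclusive, entity class)
Span = Tuple[int, int, str]


def spans_to_tags(tokens: List[str], spans: List[Span]) -> List[str]:
    """Encode entity spans as one tag per word (assumed B-/I-/O scheme)."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"          # first word of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining words of the entity
    return tags


def tags_to_spans(tags: List[str]) -> List[Span]:
    """Decode per-word tags back into entity spans."""
    spans: List[Span] = []
    start, label = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and tag[2:] == label
        if start is not None and not continues:
            spans.append((start, i, label))
            start, label = None, None
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, label = i, tag[2:]
    return spans


# The example sentence from the abstract: one name, three entity classes.
tokens = "Washington announced that Washington ate seven hotdogs in Washington".split()
spans = [(0, 1, "ORGANIZATION"), (3, 4, "PERSON"), (8, 9, "LOCATION")]
tags = spans_to_tags(tokens, spans)

for word, tag in zip(tokens, tags):
    print(f"{word}\t{tag}")
```

Once the task is in this form, any sequence tagger (an HMM in the work described here) can be trained to predict one tag per word, and the predicted tags can be decoded back into entity spans.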


Similar resources

Name Translation based on Fine-grained Named Entity Recognition in a Single Language

We propose named entity abstraction methods with fine-grained named entity labels for improving statistical machine translation (SMT). The methods are based on a bilingual named entity recognizer that uses a monolingual named entity recognizer with transliteration. Through experiments, we demonstrate that incorporating fine-grained named entities into statistical machine translation improves th...


Effective Adaptation of Hidden Markov Model-based Named Entity Recognizer for Biomedical Domain

In this paper, we explore how to adapt a general Hidden Markov Model-based named entity recognizer effectively to the biomedical domain. We integrate various features, including simple deterministic features, morphological features, POS features and semantic trigger features, to capture various kinds of evidence, especially for biomedical named entities, and evaluate their contributions. We also present a simp...


Improving Information Extraction by Modeling Errors in Speech Recognizer Output

In this paper we describe a technique for improving the performance of an information extraction system for speech data by explicitly modeling the errors in the recognizer output. The approach combines a statistical model of named entity states with a lattice representation of hypothesized words and errors annotated with recognition confidence scores. Additional refinements include the use of m...


Named Entity Recognition in Persian Text using Deep Learning

Named entity recognition is a fundamental task in the field of natural language processing. It is also known as a subtask of information extraction. The process of recognizing named entities aims at finding proper nouns in the text and classifying them into predetermined classes such as names of people, organizations, and places. In this paper, we propose a named entity recognizer which benefi...


Hybrid, Three-stage Named Entity Recognizer for Tamil

The aim of this paper is to present the construction of a hybrid, three-stage named entity recognizer for Tamil. Named entity recognition performs an in-place tagging task for a given Tamil document in three phases namely shallow parsing, shallow semantic parsing and statistical processing. The E-M algorithm (HMM) is used in the statistical processing phase, with initial probabilities obtained ...



Publication date: 2002